Disentangling Style and Speaker Attributes for TTS Style Transfer
Authors
Abstract
End-to-end neural TTS has shown improved performance in speech style transfer. However, the improvement is still limited by the available training data in both target styles and speakers. Additionally, degenerated performance is observed when the trained TTS tries to transfer the speech to a target style from a new speaker with an unknown, arbitrary style. In this paper, we propose a new approach to seen and unseen style transfer training on disjoint, multi-style datasets, i.e., datasets of different styles are recorded, one individual style by one speaker in multiple utterances. An inverse autoregressive flow (IAF) technique is first introduced to improve the variational inference for learning an expressive style representation. A speaker encoder network is then developed for learning a discriminative speaker embedding, which is jointly trained with the rest of the neural TTS modules. The proposed approach of seen and unseen style transfer is effectively trained with six specifically-designed objectives: reconstruction loss, adversarial loss, style distortion loss, cycle consistency loss, style classification loss, and speaker classification loss. Experiments demonstrate, both objectively and subjectively, the effectiveness of the proposed approach for seen and unseen style transfer tasks. The performance of our approach is superior to and more robust than those of four other reference systems of prior art.
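The inverse autoregressive flow mentioned in the abstract can be illustrated with a minimal sketch of a single flow step in its commonly used gated form. This is not the paper's network: the function name `iaf_step`, the masked linear maps `W_m` and `W_s`, and the latent size are assumptions chosen only to keep the example small, and a real style encoder would use a deeper autoregressive network conditioned on the input speech.

```python
import numpy as np

def iaf_step(z, W_m, W_s, b_m, b_s):
    """One gated IAF step on a batch of latent samples (illustrative only).

    z: (batch, dim) latent sample. W_m, W_s: (dim, dim) weights; only their
    strictly lower-triangular part is used, so output i depends on z[:, :i].
    Returns the transformed sample and log|det(d z_new / d z)| per batch row.
    """
    dim = z.shape[1]
    mask = np.tril(np.ones((dim, dim)), k=-1)  # strictly lower triangular
    m = z @ (W_m * mask).T + b_m               # shift, autoregressive in z
    s = z @ (W_s * mask).T + b_s               # pre-gate, autoregressive in z
    sigma = 1.0 / (1.0 + np.exp(-s))           # gate values in (0, 1)
    z_new = sigma * z + (1.0 - sigma) * m
    # The Jacobian is lower triangular with diagonal sigma.
    log_det = np.sum(np.log(sigma), axis=1)
    return z_new, log_det

# Hypothetical usage: transform two 4-dimensional posterior samples.
rng = np.random.default_rng(0)
dim = 4
z0 = rng.standard_normal((2, dim))
W_m, W_s = rng.standard_normal((dim, dim)), rng.standard_normal((dim, dim))
b_m, b_s = np.zeros(dim), np.ones(dim)
z1, log_det = iaf_step(z0, W_m, W_s, b_m, b_s)
```

Because the Jacobian is triangular, its log-determinant is just the sum of the log gate values. The density of the transformed sample follows as log q(z1) = log q(z0) - log_det, which is what makes the richer posterior usable inside a variational objective.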
Similar resources
Separating Style and Content for Generalized Style Transfer
Neural style transfer has drawn broad attention in recent years. However, most existing methods aim to explicitly model the transformation between different styles, and the learned model is thus not generalizable to new styles. We here attempt to separate the representations for styles and contents, and propose a generalized style transfer network consisting of style encoder, content encoder, m...
Adding speaking style to a TTS system
This paper aims to enhance the performance of a TTS system by generating various speaking styles. First we describe three speaking styles (Radio News, Political Address and Conversation) and compare the prosodic features found in these authentic styles with the prosody in “neutral” speech uttered by the eLite TTS system ([1]). Differences concern about 20 prosodic characteristics (F0 span, spee...
Development of a genre-dependent TTS system with cross-speaker speaking-style transplantation
One of the biggest challenges in speech synthesis is the production of contextually-appropriate naturally sounding synthetic voices. This means that a Text-To-Speech system must be able to analyze a text beyond the sentence limits in order to select, or even modulate, the speaking style according to a broader context. Our current architecture is based on a two-step approach: text genre identifi...
Artistic Style Transfer for Videos
In the past, manually re-drawing an image in a certain artistic style required a professional artist and a long time. Doing this for a video sequence single-handed was beyond imagination. Nowadays computers provide new possibilities. We present an approach that transfers the style from one image (for example, a painting) to a whole video sequence. We make use of recent advances in style transfe...
Stereoscopic Neural Style Transfer
This paper presents the first attempt at stereoscopic neural style transfer, which responds to the emerging demand for 3D movies or AR/VR. We start with a careful examination of applying existing monocular style transfer methods to left and right views of stereoscopic images separately. This reveals that the original disparity consistency cannot be well preserved in the final stylization result...
Journal
Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Year: 2022
ISSN: 2329-9304, 2329-9290
DOI: https://doi.org/10.1109/taslp.2022.3145297